Practical 7

Log-Gaussian Cox Processes

In this practical we will:

Libraries to load:

library(dplyr)
library(INLA)
library(ggplot2)
library(patchwork)
library(inlabru)     
library(spatstat)
library(sf)
library(scico)
library(spatstat)
library(lubridate)
library(terra)
library(tidyterra)

In this practical we consider the data clmfires in the spatstat library.

This dataset is a record of forest fires in the Castilla-La Mancha region of Spain between 1998 and 2007. This region is approximately 400 by 400 kilometres. The coordinates are recorded in kilometres. For more info about the data you can type:

?clmfires

We first read the data and transform them into an sf object. We also create a polygon that represents the border of the Castilla-La Mancha region. We select the data for year 2004 and only those fires caused by lightning.

data("clmfires")
pp = st_as_sf(as.data.frame(clmfires) %>%
                mutate(x = x, 
                       y = y),
              coords = c("x","y"),
              crs = NA) %>%
  filter(cause == "lightning",
         year(date) == 2004)

poly = as.data.frame(clmfires$window$bdry[[1]]) %>%
  mutate(ID = 1)

region = poly %>% 
  st_as_sf(coords = c("x", "y"), crs = NA) %>% 
  dplyr::group_by(ID) %>% 
  summarise(geometry = st_combine(geometry)) %>%
  st_cast("POLYGON") 
  
ggplot() + geom_sf(data = region, alpha = 0) + geom_sf(data = pp)  
Figure 1: Distribution of the observed forest fires caused by lightning in Castilla-La Mancha in 2004

Fit a homogeneous Poisson Process

As a first exercise we are going to fit a homogeneous Poisson process (HPP) to the data. This is a model that assume constant intensity over the whole space so our linear predictor is then:

\[ \eta(s) = \log\lambda(s) = \beta_0 , \ \mathbf{s}\in\Omega \]

so the likelihood can be written as:

$$

\[\begin{aligned} p(\mathbf{y} | \lambda) & \propto \exp \left( -\int_\Omega \lambda(\mathbf{s}) \mathrm{d}\mathbf{s} \right) \prod_{i=1}^n \lambda(\mathbf{s}_i) \\ & = \exp\left( -\int_{\Omega}\exp(\beta_0)ds\right)\prod_{i=1}^n \lambda(\mathbf{s}_i) \\ \end{aligned}\]

$$

where \(|\Omega|\) is the area of the domain of interest.

We need to approximate the integral using a numerical integration scheme as:

\[ \approx\exp\left(-\sum_{k=1}^{N_k}w_k\lambda(s_k)\right)\prod_{i=1}^n \lambda(\mathbf{s}_i) \]

Where \(N_k\) is the number of integration points \(s_1,\dots,s_{N_k}\) and \(w_1,\dots,w_{N_k}\) are the integration weights.

In this case, since the intensity is constant, the integration scheme is really simple: it is enough to consider one random point inside the domain with weight equal to the area of the domain.

# define integration scheme

ips = st_sf(
geometry = st_sample(region, 1)) # some random location inside the domain
ips$weight = st_area(region) # integration weight is the area of the domain

cmp = ~ 0 + beta_0(1)

formula = geometry ~ beta_0

lik = bru_obs(data = pp,
              family = "cp",
              formula = formula,
              ips = ips)
fit1 = bru(cmp, lik)
Warning Task
  1. What is the estimated intensity?
  2. Compare the estimated expected number of fires on the whole domain with the observed ones.

Remember that in the inlabru framework we model the log intensity \(\eta = log(\lambda)\)

Code
# 1) The estimated posterior distribution of the  intensity is

post_int = inla.tmarginal(function(x) exp(x), fit1$marginals.fixed$beta_0)
post_int %>% ggplot() + geom_line(aes(x,y))

Code
# 2) To compute the expected number of points in the area we need to multiply the
# estimated intensity by the area of the domain.
# In the same plot we also show the number of observed fires as a vertical line.

post_int = inla.tmarginal(function(x) st_area(region)* exp(x), fit1$marginals.fixed$beta_0)
post_int %>% ggplot() + geom_line(aes(x,y)) +
  geom_vline(xintercept = dim(pp)[1])

Inhomogeneous Poisson Process

The model above has the clear disadvantages that assumes a constant intensity and from Figure Figure 1 we clearly see that this is not the case.

The library spatstat contains also some covariates that can help explain the fires distribution. Figure @fit-altitude shows the location of fires together with the (scaled) altitude.

#|
elev_raster = rast(clmfires.extra[[2]]$elevation)
elev_raster = scale(elev_raster)
ggplot() + geom_spatraster(data = elev_raster) + geom_sf(data = pp) + scale_fill_scico()
Figure 2: Distribution of the observed forest fires and scaled altitude

We are now going to use the altitude as a covariate to explain the variability of the intensity \(\lambda(s)\) over the domain of interest.

Our model is \[ \log\lambda(s) = \beta_0 + \beta_1x(s) \] where \(x(s)\) is the altitude at location \(s\).

The likelihood becomes: \[ \begin{aligned} p(\mathbf{y} | \lambda) & \propto \exp \left( -\int_\Omega \lambda(\mathbf{s}) \mathrm{d}\mathbf{s} \right) \prod_{i=1}^n \lambda(\mathbf{s}_i) \\ & = \exp \left( -\int_\Omega \exp(\beta_0 + \beta_1x(s)) \mathrm{d}\mathbf{s} \right) \prod_{i=1}^n \lambda(\mathbf{s}_i) \\ \end{aligned} \] Now we need to choose an integration scheme to solve the integral.

In this case we will take a simple grid based approach where each quadrature location has an equal weight. Our grid consists of \(N_k = 1000\) points and the weights are all equal to \(|\Omega|/N_k\).

#|
n.int = 1000
ips = st_sf(geometry = st_sample(region,
            size = n.int,
            type = "regular"))

ips$weight = st_area(region) / n.int
ggplot() + geom_sf(data = ips, aes(color = weight)) + geom_sf(data= region, alpha = 0)
Figure 3: Integration scheme.

OBS: The implicit assumption here is that the intensity is constant inside each grid box, and so is the covariate!!

We can now fit the model:

cmp = ~ Intercept(1) + elev(elev_raster, model = "linear")
formula = geometry ~ Intercept + elev
lik = bru_obs(data = pp,
              family = "cp",
              formula = formula,
              ips = ips)
fit2 = bru(cmp, lik)
Warning Task

What is the effect of the altitude on the (log) intensity of the process?

You can look at the summary for the fixed effects

Code
fit2$summary.fixed
                mean         sd 0.025quant   0.5quant 0.975quant       mode
Intercept -6.6056709 0.10183751 -6.8052688 -6.6056709 -6.4060731 -6.6056709
elev       0.6697347 0.06915237  0.5341985  0.6697347  0.8052708  0.6697347
                   kld
Intercept 8.349015e-15
elev      0.000000e+00

⚠️ WARNING!!⚠️ When fitting a Point process, the integration scheme has to be fine enough to capture the spatial variability of the covariate!!

Warning Task

Rerun the model with the altitude as covariate, but this time change the integration scheme as follows:

n.int2 = 50

ips2 = st_sf(geometry = st_sample(region,
            size = n.int2,
            type = "regular"))
ips2$weight = st_area(region) / n.int2

What happens to the effect of the covariate?

Re-run the model chaging the integration scheme in the ips input of the bru_obs() function.

Code
lik_bis = bru_obs(data = pp,
              family = "cp",
              formula = formula,
              ips = ips2)

fit2bis = bru(cmp, lik_bis)

# you can the check the differences between the two models
# fit2$sumary.fixed
# fit2bis$sumary.fixed
Warning Task

Now we want to predict the log-intensity over the whole domain.

Use the grid from the elevation raster

est_grid = st_as_sf(data.frame(crds(elev_raster)), coords = c("x","y"))
est_grid  = st_intersection(est_grid, region)

to predict the intensity over the domain.

Use the predict() function.

Code
preds2 = predict(fit2, est_grid, ~ data.frame(log_scale = Intercept + elev,

                                              lin_scale = exp(Intercept + elev)))
# then visualize it like
#preds2$log_scale %>% ggplot() + geom_sf(aes(color = mean)) + scale_color_scico()

Finally, we want to use the fitted model to estimate the total number of fires over the whole region. To do this we first have to fine the expected number of fires as:

\[ E(N_{\Omega}) = \int_{\Omega}\exp(\lambda(s))\ ds \]

Then simulate possible realizations of \(N_{\Omega}\) to include also the likelihood variability in our estimate:

N_fires = generate(fit2, ips,
                      formula = ~ {
                        lambda = sum(weight * exp(elev + Intercept))
                        rpois(1, lambda)},
                    n.samples = 2000)

ggplot(data = data.frame(N = as.vector(N_fires))) +
  geom_histogram(aes(x = N),
                 colour = "blue",
                 alpha = 0.5,
                 bins = 20) +
  geom_vline(xintercept = nrow(pp),
             colour = "red") +
  theme_minimal() +
  xlab(expression(Lambda))

Log-Gaussian Cox Process

Finally we want to fit a LGCP with log intensity:

\[ \log(s) = \beta_0 + \beta_1x + u(s) \] where \(\beta_0\) is the intercept, \(\beta_1\) the effect of (standarized) altitude \(x(s)\) as before and \(u(s)\) is a Gaussian Random field defined through the SPDE approach.

Define the mesh

The first step, as any time we use the SPDE approach is to defie the mesh and the priors for the marginal variance and range:

mesh = fm_mesh_2d(boundary = region,
                  max.edge = c(5, 10),
                  cutoff = 4, crs = NA)

ggplot() + gg(mesh) + geom_sf(data = pp)

spde_model =  inla.spde2.pcmatern(mesh,
                                  prior.sigma = c(1, 0.5),
                                  prior.range = c(100, 0.5))

We can then define the integration weight. Here we use the same points to define the SPDE approximation and to approximate the integral in the likelihood. We will see later that this does not have to be like this, BUT integration weight and SPDE weights have to be consistent with each other!

ips = fm_int(mesh, samplers = region)

ggplot() + geom_sf(data = ips, aes(color = weight)) +
  gg(mesh) +
   scale_color_scico()

Run the model

Warning Task

Set up the components and the formula for the model above by completing the code below and run the model.

cmp = ~ ...

formula = geometry ~ ...

lik = bru_obs("cp",
              formula = formula,
              data = pp,
              ips = ...)

fit3 = bru(cmp, lik)

The model has three components: intercept, linear effect of altitude and the spatial GRF

Code
cmp = ~ Intercept(1) + space(geometry, model = spde_model) + elev(elev_raster, model = "linear")

formula = geometry ~ Intercept + space + elev

lik = bru_obs("cp",
              formula = formula,
              data = pp,
              ips = ips)

fit3 = bru(cmp, lik)

Note when running the model above you will get a warning:

Warning in bru_log_warn(msg): Model input 'elev_raster' for 'elev' returned some NA values.
Attempting to fill in spatially by nearest available value.
To avoid this basic covariate imputation, supply complete data.

It means that the bru() function cannot find the covariate values for some of the mesh nodes. This is a common situation. As the warning says, the bru() function automatically imputes the value of the covarite using the nearest nodes. This increases the running time of the bru() function, so one solution is to impute the values of the covariate over the whole mesh ‘before’ running the bru() function. This can be done using the code below:

# Extend raster ext by 30 % of the original raster so it covers the whole mesh
re <- extend(elev_raster, ext(elev_raster)*1.3)
# Convert to an sf spatial object
re_df <- re %>% stars::st_as_stars() %>%  st_as_sf(na.rm=F)
# fill in missing values using the original raster 
re_df$lyr.1 <- bru_fill_missing(elev_raster,re_df,re_df$lyr.1)
# rasterize
elev_rast_p <- stars::st_rasterize(re_df) %>% rast()
ggplot() + geom_spatraster(data = elev_rast_p) 

NOTE The bru_fill_missing() function was added mainly to handle very local infilling on domain boundaries. For properly missing data, one should consider doing a proper model of the spatial field instead.

Results

Warning Task

Plot the estimated mean and standard deviation of the spatial GF and the log-intensity over the domain of interest

Use the fm_pixels() and predict() functions.

Code
pxl = fm_pixels(mesh, mask= region, dims = c(200,200))
preds = predict(fit3, pxl, ~data.frame(spde = space,
                                       log_int = Intercept + space + elev))

#and plot as 
#library(scico)
#preds$spde %>% ggplot() + geom_sf(aes(color = mean)) + scale_color_scico() +
#  ggtitle("spde mean")
#preds$spde %>% ggplot() + geom_sf(aes(color = sd)) + scale_color_scico() +
#  ggtitle("spde sd")

# preds$log_int %>% ggplot() + geom_sf(aes(color = mean)) + scale_color_scico() +
#  ggtitle("log-int mean")
# preds$log_int %>% ggplot() + geom_sf(aes(color = sd)) + scale_color_scico() +
#  ggtitle("log-int sd")

Instead of just looking at the posterior mean and standard deviation, it can be uselull to look at simulated fields from the posterior distribution. This is because the mean field is, by definition, smoother than any realization of the field. So looking at simulation can give us a better idea of how the field might look like. We can do this using the generate() function:

sim_fields = generate(fit3, pxl, ~data.frame(spde = space,
                                       log_int = Intercept + space + elev),
                     n.samples = 4)

cbind(pxl,sapply(sim_fields, function(x) x$spde)) %>%
  pivot_longer(-geometry) %>%
  ggplot() + geom_sf(aes(color = value)) + 
  facet_wrap(.~name) + scale_color_scico() + ggtitle("simulated spatial fields")

cbind(pxl,sapply(sim_fields, function(x) x$log_int)) %>%
  pivot_longer(-geometry) %>%
  ggplot() + geom_sf(aes(color = value)) + 
  facet_wrap(.~name) + scale_color_scico() + ggtitle("simulated log intensity")